Automatic Selection of Noun Phrases as Document Descriptors in an FCA-Based Information Retrieval System
Abstract
Automatic attribute selection is a critical step when using Formal Concept Analysis (FCA) in a free text document retrieval framework. Optimal attributes as document descriptors should produce smaller, clearer and more browsable concept lattices with better clustering features. In this paper we focus on the automatic selection of noun phrases as document descriptors to build an FCA-based IR framework. We present three different phrase selection strategies which are evaluated using the Lattice Distillation Factor and the Minimal Browsing Area evaluation measures. Noun phrases are shown to produce lattices with good clustering properties, with the advantage (over simple terms) of being better intensional descriptors from the user’s point of view.
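To make the role of attributes concrete, the following is a minimal sketch of how formal concepts arise once noun phrases have been chosen as document descriptors. It is not the paper's selection strategy or its evaluation measures. The toy context `CONTEXT` and the helper names `common_attributes`, `matching_documents` and `formal_concepts` are illustrative assumptions, and the brute-force enumeration is only suitable for very small examples.

```python
from itertools import combinations

# Hypothetical toy context: document id -> set of noun-phrase descriptors.
CONTEXT = {
    "d1": {"concept lattice", "document retrieval"},
    "d2": {"concept lattice", "noun phrase"},
    "d3": {"document retrieval", "noun phrase"},
    "d4": {"concept lattice", "document retrieval", "noun phrase"},
}


def common_attributes(docs, context):
    """Return the attributes shared by every document in `docs`.

    For an empty set of documents this is the full attribute set.
    """
    shared = set().union(*context.values())
    for doc in docs:
        shared &= context[doc]
    return frozenset(shared)


def matching_documents(attrs, context):
    """Return the documents that carry every attribute in `attrs`."""
    return frozenset(doc for doc, own in context.items() if attrs <= own)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every document subset.

    Exponential in the number of documents; illustrative only.
    """
    concepts = set()
    docs = sorted(context)
    for size in range(len(docs) + 1):
        for subset in combinations(docs, size):
            intent = common_attributes(subset, context)
            extent = matching_documents(intent, context)
            concepts.add((extent, intent))
    return concepts


if __name__ == "__main__":
    # Print concepts from the most general (largest extent) to the most specific.
    for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: -len(c[0])):
        print(sorted(extent), sorted(intent))
```

Each printed pair is one node of the concept lattice: the intent is the set of noun phrases a user would read as the node's description, and the extent is the cluster of documents it retrieves. Fewer or more discriminating descriptors change how many such nodes exist, which is why attribute selection affects lattice size and browsability.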
Similar resources
Using Noun Phrase Heads to Extract Document Keyphrases
Automatically extracting keyphrases from documents is a task with many applications in information retrieval and natural language processing. Document retrieval can be biased towards documents containing relevant keyphrases; documents can be classified or categorized based on their keyphrases; automatic text summarization may extract sentences with high keyphrase scores. This paper describes a ...
Automatic hypertext information retrieval in a corporate memory using noun phrases in context
In this paper, we describe a method to generate an information retrieval hypertext structure on a large collection of homogeneous documents by generating links only between noun phrases that are pertinent for navigation. Noun phrases are selected by automatic extraction and filtered on the basis of the linguistic context class where they appear, also determined automatically.
Automatic titling of Articles Using Position and Statistical Information
This paper describes a system that facilitates information retrieval in a set of textual documents by tackling the automatic titling and subtitling issue. Automatic titling here consists in extracting relevant noun phrases from texts as candidate titles. An original approach combining statistical criteria and noun phrase positions in the text helps collect relevant titles and subtitles. So, the...
Recognising Complex Prepositions Prep+N+Prep as Negative Patterns in Automatic Term Extraction from Texts
This work is a study of the delimitation of complex prepositions (CP) as lexical units, items of a computational lexicon that includes compounds and phrases. In addition, given the utmost importance of spotting noun phrases (NP) in document retrieval systems, parsing prepositional structures such as “Prep1 N Prep2 X” prevents the fragment “N Prep2 X” from being detected as a noun phrase, i.e. t...
Noun phrases as building blocks for cross-language Search Assistance
This paper presents a Foreign-Language Search Assistant that uses noun phrases as fundamental units for document translation and query formulation, translation and refinement. The system (a) supports the foreign-language document selection task providing a cross-language indicative summary based on noun phrase translations, and (b) supports query formulation and refinement using the information...